
Fallback to use the CUDA primary context #189

Merged: 18 commits into rapidsai:branch-23.04 on Apr 3, 2023

Conversation

@madsbk (Member) commented Mar 29, 2023

@madsbk madsbk added bug Something isn't working non-breaking Introduces a non-breaking change labels Mar 29, 2023
@madsbk madsbk marked this pull request as ready for review March 29, 2023 12:18
@madsbk madsbk requested a review from wence- March 29, 2023 12:18
@wence- (Contributor) left a comment

Can we work through the context-getting logic? I am not sure it is right.

Comment on lines 114 to 119
if (err != CUDA_ERROR_INVALID_VALUE) {
  // If this isn't the case, we grab the current context
  CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
} else {
  CUDA_DRIVER_TRY(err);  // Check for errors from previous asynchronous launches
}
@wence- (Contributor)

I think this logic is wrong? Let's try and work through it:

The options for the allocation are:

  1. stream ordered: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_SUCCESS and put ctx = nullptr
  2. not stream ordered: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_SUCCESS and put ctx = some_context
  3. not a UVM allocation: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_ERROR_INVALID_VALUE
  4. Some other thing: some other cuda error

So if we hit case 3 or 4, we want to error; if we hit case 2, we want to return the context that the query returned; if we hit case 1, we need to do some more work. I think we can just use the CU_POINTER_ATTRIBUTE_DEVICE_POINTER query and then do something based on the result (either return a non-null current context, or else build a primary context and return that).

So perhaps:

CUcontext ctx;
// Either the pointer is valid (CUDA_SUCCESS), or it isn't (some other error)
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr));
if (ctx != nullptr) {
  // Not a stream-ordered allocation; this context is fine.
  return ctx;
}
// Stream-ordered allocation: let's see if the current device context exists and we can use it
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
if (ctx != nullptr) {
  CUdeviceptr target{};
  auto const err = cudaAPI::instance().PointerGetAttribute(&target, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr);
  if (err == CUDA_SUCCESS && target == dev_ptr) { return ctx; }
}
// Either no current context, or the current context can't access the device pointer?
int ord = get_device_ordinal_from_pointer(dev_ptr);
_primary_contexts.try_emplace(ord, ord);
return _primary_contexts.at(ord).ctx;

WDYT?

@madsbk (Member, author)

You are right: I messed up the equality checks, and we don't need to check the accessibility of dev_ptr when the CU_POINTER_ATTRIBUTE_CONTEXT query succeeds. However, we must survive a CUDA_ERROR_INVALID_VALUE error.

@madsbk madsbk requested a review from wence- March 29, 2023 14:49
CUcontext ctx;
const CUresult err =
  cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr);
if (err != CUDA_ERROR_INVALID_VALUE) {
@wence- (Contributor)

Why do we fall through if the result is CUDA_ERROR_INVALID_VALUE? Ah, I guess if we don't have UVM, this might still be a device pointer?

@madsbk (Member, author)

We don't want to crash when getting CUDA_ERROR_INVALID_VALUE, so we cannot use CUDA_DRIVER_TRY(). Instead, we want to get the current context (if any).

CUdeviceptr current_ctx_dev_ptr{};
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(
&current_ctx_dev_ptr, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr));
if (current_ctx_dev_ptr != dev_ptr) { return ctx; }
@wence- (Contributor) commented Mar 29, 2023

I think I don't understand this check. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g0c28ed0aff848042bc0533110e45820c says:

Returns in *data the device pointer value through which ptr may be accessed by kernels running in the current CUcontext. The type of data must be CUdeviceptr *.

If there exists no device pointer value through which kernels running in the current CUcontext may access ptr then CUDA_ERROR_INVALID_VALUE is returned.

If there is no current CUcontext then CUDA_ERROR_INVALID_CONTEXT is returned.

Except in the exceptional disjoint addressing cases discussed below, the value returned in *data will equal the input value ptr.

So my reading is that if the driver call was successful, then current_ctx_dev_ptr will either be the same as dev_ptr (usual case) or different (but still valid) in the corner cases where dev_ptr was allocated either by cuMemHostRegister or cuMemHostAlloc with CU_MEMHOSTALLOC_WRITECOMBINED.

In either case, the current context is good for accessing the pointer.

So I think:

Suggested change:
- if (current_ctx_dev_ptr != dev_ptr) { return ctx; }
+ if (current_ctx_dev_ptr == dev_ptr) { return ctx; }

@madsbk (Member, author)

If dev_ptr != current_ctx_dev_ptr, we cannot access dev_ptr using the current context, right?

@wence- (Contributor)

No, it means that the memory was allocated in one of the special ways listed above. The context is capable of accessing the data, but must do so through current_ctx_dev_ptr (not dev_ptr). Since this function is only asking "what context is good for accessing this pointer?" I think we should always return the ctx here.

@wence- (Contributor) left a comment

Still one quibble about the case where we have a context and have queried the device pointer attribute.

Co-authored-by: Lawrence Mitchell <[email protected]>
@madsbk madsbk marked this pull request as draft March 29, 2023 16:12
@madsbk (Member, author) commented Mar 29, 2023

Marked the PR as a draft again. I will write some tests tomorrow before we merge.

Co-authored-by: Lawrence Mitchell <[email protected]>
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
if (ctx != nullptr) {
  CUdeviceptr current_ctx_dev_ptr{};
  CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(
@wence- (Contributor)

We could have CUDA_ERROR_INVALID_VALUE here (the current context is not set up for allowing us to access this pointer). Is that a recoverable error?

@wence- (Contributor)

Should certainly handle other errors as failure.

// Finally, if we didn't find any usable context, we return the primary context of the
// device that owns `dev_ptr`. Note that we use `_primary_contexts` to cache the primary
// context of each device.
int ordinal = get_device_ordinal_from_pointer(dev_ptr);
@wence- (Contributor)

If dev_ptr was allocated by cuMemHostAlloc with CU_MEMHOSTALLOC_WRITECOMBINED, then it is the "host address": it looks like a UVM address, but it is unmapped on the device, so it cannot be used in a kernel, even with this primary context that we just made. In that case, the pointer that is usable on the device is the one obtained by querying CU_POINTER_ATTRIBUTE_DEVICE_POINTER.

@wence- (Contributor)

After setting the primary context, I think that if we go round and query CU_POINTER_ATTRIBUTE_DEVICE_POINTER again and find that the result is different from the input, then we get an error.

@wence- (Contributor)

Maybe we should try to clarify some more with the CUDA team.

@wence- (Contributor) left a comment

Thanks, just some minor debugging leftovers to clean up, I think.

@madsbk madsbk marked this pull request as ready for review March 30, 2023 12:21
@madsbk (Member, author) commented Mar 30, 2023

Thanks for the help @wence- !

@madsbk madsbk requested a review from wence- March 30, 2023 14:01
@wence- (Contributor) left a comment

Thanks @madsbk! How shall we test that this works for the initial reported cudf bug?

@madsbk (Member, author) commented Mar 31, 2023

> How shall we test that this works for the initial reported cudf bug?

In test_no_current_cuda_context(), we test the underlying problem of the reported cudf+dask bug.
But it might be too expensive in terms of CI resources to test the exact cudf code.
Since KvikIO is header-only, CI would have to build cudf every time.

@wence- (Contributor) commented Mar 31, 2023

> How shall we test that this works for the initial reported cudf bug?

> In test_no_current_cuda_context(), we test the underlying problem of the reported cudf+dask bug. But it might be too expensive in terms of CI resources to test the exact cudf code. Since KvikIO is header-only, CI would have to build cudf every time.

I was going to just build locally and check?

@madsbk (Member, author) commented Mar 31, 2023

> I was going to just build locally and check?

Ahh sure, I have done that continually :)

To build cudf using this branch, point the build.sh script at the KvikIO source like:

diff --git a/build.sh b/build.sh
index bee66d819b..b6bcf5ce75 100755
--- a/build.sh
+++ b/build.sh
@@ -288,6 +288,7 @@ if buildAll || hasArg libcudf; then
           -DDISABLE_DEPRECATION_WARNINGS=${BUILD_DISABLE_DEPRECATION_WARNINGS} \
           -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=${BUILD_PER_THREAD_DEFAULT_STREAM} \
           -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
+          -DCPM_KvikIO_SOURCE=/home/mkristensen/repos/kvikio \
           ${EXTRA_CMAKE_ARGS}
 
     cd ${LIB_BUILD_DIR}

@wence- (Contributor) commented Apr 3, 2023

Can confirm this fixes the reported cudf issue.

@wence- (Contributor) commented Apr 3, 2023

/merge

@rapids-bot rapids-bot bot merged commit 5978055 into rapidsai:branch-23.04 Apr 3, 2023
@madsbk madsbk deleted the use_cuda_primary_context branch April 11, 2023 06:46
Labels
bug: Something isn't working
non-breaking: Introduces a non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] dd.from_map with cudf read_json not working when pool is enabled
2 participants